In [ ]:
seaborn:
it is a Python library, built on top of matplotlib, that is used to build statistical visualizations
it makes the statistical structure of the data easier to read
it helps us see the patterns and relationships inside the data
with the help of these visualizations we can make data-driven decisions more easily
In [ ]:
seaborn groups its plotting functions into the following categories
categorical plots
distribution plots
regression plots
matrix plots
relational plots
multi plot grids
In [ ]:
categorical plots:
used to visualize the data across different categories
barplot()
boxplot()
violinplot()
countplot()
stripplot()
catplot() (renamed from factorplot() in seaborn 0.9)
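A minimal sketch of catplot(), the figure-level function for categorical plots. The small DataFrame and its column names are made up for illustration; the `kind` argument picks which categorical plot is drawn.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Small synthetic dataset (illustrative values only)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Category": np.repeat(["Electronics", "Clothing", "Home"], 30),
    "Sales": rng.normal(500, 120, 90).round(2),
})

# kind="box" draws a boxplot; other options include "bar", "violin", "count", "strip"
g = sns.catplot(data=df, x="Category", y="Sales", kind="box")
```

catplot() returns a FacetGrid rather than a single Axes, so the same call can be split into panels with the `col` or `row` arguments.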
In [ ]:
distribution plots:
used to see the distribution of one variable (univariate) or of two variables together (bivariate)
jointplot
histplot / displot (replacements for distplot, which is deprecated since seaborn 0.11)
pairplot
rugplot
In [ ]:
regression plot:
it is used to show the linear relationship between two data columns
lmplot
regplot
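A minimal regplot() sketch on synthetic data where y depends roughly linearly on x. The column names and the slope of 3 are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with an approximately linear relationship
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
df = pd.DataFrame({"Discount": x, "Sales": 3 * x + rng.normal(0, 1, 100)})

# Scatter plot plus a fitted regression line and its confidence band
ax = sns.regplot(data=df, x="Discount", y="Sales")
```

regplot() draws on a single Axes; lmplot() accepts the same x and y but returns a FacetGrid, so it can also split the fit by `hue`, `col` or `row`.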
In [ ]:
matrix plot:
it is used to show the data as a color-coded matrix
heatmap
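A sketch of the most common heatmap use, plotting a correlation matrix. The three columns are random numbers used only for illustration, so the correlations off the diagonal will be close to zero.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric columns (illustrative values only)
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["Sales", "Profit", "Quantity"])

# Pairwise Pearson correlation between the numeric columns
corr = df.corr()

# annot=True writes each value in its cell; vmin/vmax fix the color scale to [-1, 1]
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```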
In [ ]:
relational plot:
it is used to show the relationships between two variables
relplot()
scatterplot()
lineplot()
In [ ]:
multi plot grids:
it is used to draw the same plot on different subsets of the data, one panel per subset
FacetGrid
In [ ]:
barplot: displays an aggregated value for each category (the mean by default; min, max and others through the estimator parameter)
lineplot: displays the relationship between two numeric columns as a continuous line
scatterplot: shows the relationship between two columns (input/output) as individual points
countplot: shows the count of observations in each category
histogram: shows the distribution of the data by counting the values that fall in each bin
kde: kernel density estimate; shows the distribution of the data as a smooth curve
boxplot: displays the median, the quartiles and the outliers of the data
violinplot: combination of a boxplot and a kde plot
heatmap: shows a matrix of values as colors, most often the correlation matrix of the data
pairplot: shows the pairwise relationships between all numeric columns (scatterplots off the diagonal, histogram or kde on the diagonal)
lmplot: scatter plot with a regression line
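A sketch of how barplot aggregates. By default each bar is the mean of its category, and passing a different function as `estimator` changes the aggregate. The four rows below are made-up values so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Tiny hand-made dataset (illustrative values only)
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Profit": [10, 30, 20, 50],
})

# Each bar shows the maximum Profit of its Region instead of the mean
ax = sns.barplot(data=df, x="Region", y="Profit", estimator=np.max)

# Bar heights, read back from the drawn rectangles
heights = sorted(float(p.get_height()) for p in ax.patches)
```

With these values the two bars have heights 30 and 50, the maximum of each region.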
In [ ]:
pip install seaborn
In [1]:
import seaborn as sns
In [3]:
import numpy as np
import pandas as pd
# Set random seed for reproducibility
np.random.seed(42)
# Create synthetic data
n = 300
data = pd.DataFrame({
    "Date": pd.date_range(start="2024-01-01", periods=n, freq="D"),
    "Region": np.random.choice(["North", "South", "East", "West"], n),
    "Category": np.random.choice(["Electronics", "Clothing", "Home", "Sports"], n),
    "Sales": np.random.normal(500, 120, n).round(2),
    "Profit": np.random.normal(80, 30, n).round(2),
    "Quantity": np.random.randint(1, 20, n),
    "Discount": np.random.uniform(0, 0.3, n).round(2),
    "Customer_Age": np.random.randint(18, 65, n)
})
In [5]:
data
Out[5]:
| | Date | Region | Category | Sales | Profit | Quantity | Discount | Customer_Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | East | Electronics | 505.47 | 78.02 | 4 | 0.05 | 25 |
| 1 | 2024-01-02 | West | Electronics | 421.81 | 43.67 | 18 | 0.26 | 39 |
| 2 | 2024-01-03 | North | Home | 757.27 | 60.44 | 5 | 0.07 | 61 |
| 3 | 2024-01-04 | East | Clothing | 576.07 | 81.42 | 16 | 0.29 | 53 |
| 4 | 2024-01-05 | East | Clothing | 256.98 | 54.19 | 1 | 0.10 | 22 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | 2024-10-22 | West | Sports | 505.77 | 64.31 | 14 | 0.30 | 42 |
| 296 | 2024-10-23 | South | Electronics | 531.17 | 67.39 | 11 | 0.20 | 26 |
| 297 | 2024-10-24 | East | Home | 391.48 | 71.55 | 18 | 0.17 | 19 |
| 298 | 2024-10-25 | North | Sports | 576.63 | 39.67 | 12 | 0.22 | 32 |
| 299 | 2024-10-26 | West | Clothing | 300.62 | 52.44 | 12 | 0.14 | 20 |
300 rows × 8 columns
In [7]:
sns.histplot(data["Sales"],kde=True)
Out[7]:
<Axes: xlabel='Sales', ylabel='Count'>
In [11]:
sns.boxplot(x="Category",y="Sales",data=data)
Out[11]:
<Axes: xlabel='Category', ylabel='Sales'>
In [15]:
sns.barplot(x="Region",y="Profit",data=data)
Out[15]:
<Axes: xlabel='Region', ylabel='Profit'>
In [17]:
sns.scatterplot(x="Discount",y="Sales",data=data)
Out[17]:
<Axes: xlabel='Discount', ylabel='Sales'>
In [23]:
sns.countplot(x="Region",data=data)
Out[23]:
<Axes: xlabel='Region', ylabel='count'>
In [27]:
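# jointplot needs the two columns to compare; Sales and Profit are used here
sns.jointplot(data=data, x="Sales", y="Profit")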
Out[27]:
<seaborn.axisgrid.JointGrid at 0x1a47ad158b0>
In [29]:
sns.pairplot(data=data)
Out[29]:
<seaborn.axisgrid.PairGrid at 0x1a47ab6cec0>
In [ ]:
what is seaborn, and how is it different from matplotlib?
how do you install the seaborn library?
what is the difference between countplot() and barplot()?
how do you create a boxplot in seaborn?
what is the use of the hue parameter?
what is kdeplot?
what is the use of jointplot?
what is pairplot(), and why is it important?
what is the difference between distplot() and histplot()?
how do you create a heatmap?
what are the different plot categories available in the seaborn library?
how do you create a histplot?
how do you visualize the correlation matrix?
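As an illustration for the question on hue above: hue takes a column name and splits each group by that column, drawing one color per value and adding a legend. The five rows are made-up values.

```python
import pandas as pd
import seaborn as sns

# Tiny hand-made dataset (illustrative values only)
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Category": ["Home", "Sports", "Home", "Home", "Sports"],
})

# Without hue: one bar per Region. With hue: one bar per (Region, Category) pair
ax = sns.countplot(data=df, x="Region", hue="Category")

# The legend lists the values of the hue column
legend_labels = {t.get_text() for t in ax.get_legend().get_texts()}
```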
In [ ]:
scenario-based questions:
1. sales analysis
you need to create a sales dataset that contains an order date column
you have monthly sales for 5 years
which plot will you use to show the trend?
how will you compare sales across different regions?
2. HR dataset
you want to check the salary distribution by department
which plot will you use, and why?
how do you check for outliers in the dataset?
why do outliers matter, and how do they affect ML models?
3. healthcare dataset
cancer dataset
how many people are affected by cancer?
gender (male/female): how does cancer incidence differ by gender?
age (child/adult): how does cancer incidence differ by age group?
4. banking dataset
you want to check the correlation between the features
how will you visualize the correlation?
which plot should be used, and why?
what is correlation?
how can you tell whether your data is strongly correlated or not?
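For the outlier question in scenario 2, one possible approach is the 1.5 × IQR rule, which is also the rule a boxplot uses by default to decide which points to draw beyond the whiskers. The salary values below are invented for the example.

```python
import pandas as pd
import seaborn as sns

# Hypothetical salaries (in thousands); 150 is far from the rest
salaries = pd.Series([42, 45, 47, 50, 52, 55, 58, 60, 150], name="Salary")

# Quartiles and interquartile range
q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower) | (salaries > upper)]

# The boxplot draws the same flagged points as individual markers
ax = sns.boxplot(x=salaries)
```

Here q1 is 47 and q3 is 58, so the limits are 30.5 and 74.5 and only the value 150 is flagged.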
In [ ]:
dashboard: multiple plots shown together
streamlit: library for building a web application
a dashboard can be built with the help of a library such as plotly
In [ ]:
plotly: a library used to visualize the data in a more interactive way (hover, zoom, pan)
In [ ]:
pip install plotly
In [31]:
import plotly.express as px
In [33]:
d1=px.bar(data,x="Category",y="Sales",color="Region")
In [35]:
In [ ]: